-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some improvements to loading the session with --prompt-cache #1550
Some improvements to loading the session with --prompt-cache #1550
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
cfdfc2f
to
f07993f
Compare
LGTM! Tested both features, works great! |
I like the seed fix. Personally, I would prefer it always overrides the seed with a random one if it's set to -1. If we want to keep the seed the same, we could pass it in each time. I don't see a need to keep the seed, but if you can think of a use case, perhaps we could check if seed was set to something like -2 and only then keep the original seed? Also, currently, prompt cache treats text given during |
I agree. The issue is The only way to fix that would be store something like a
I will check that and get back to you. Breaking any current behavior definitely is not intended. |
I meant, just always, even if they don't specify the seed explicitly. I was thinking since it's just "prompt" cache, the seed shouldn't be cached at all. Still, if you think the seed should only be overridden when explicitly set, you could set it to -2 before the |
I don't want to change the existing behavior too much. I'm a little confused though, I think it already works the way you want without any further changes. With this pull, if you specify any negative seed except for Also, can you please check and see if the change I just pushed fixes your interactive mode issue? (The behavior should now be identical to the master branch behavior.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested the PR and it does preserve the use with --interactive-first
. However, I found a bug related to your random seed code.
I don't want to change the existing behavior too much. I'm a little confused though, I think it already works the way you want without any further changes.
With this pull, if you specify any negative seed except for -1, that will make it use a random seed even when loading a session.
Sorry, I'm talking about two separate solutions; maybe that's where the confusion is coming from.
The first possible solution is to just always override the seed. Although your PR says otherwise, this is actually what your code currently does. Since lines 156 to 158 never do anything, they should just be removed. Simple solution.
This will mean even without --seed
specified, we update the seed. I think the seed not being updated was simply an oversight, not a design decision. To me, this is the expected behavior. Again, the argument being, it's only a prompt cache, not a full session cache.
The other suggestion is that -2
is the seed you ignore instead of -1
. This would maintain the existing behavior of --seed -1
. To accomplish this, you could set params.seed = -2;
before gpt_params_parse
is called on line 54. This is a bit messier to me, and I've seen people ask why the seed isn't being randomized anymore (which is probably not being done with --seed -1
). I also think it's a bit early to worry that people are depending on the seed not changing.
But either way, both of these solutions work as expected when the user specifies --seed -1
.
This comment was marked as off-topic.
This comment was marked as off-topic.
If I'm reading your post correctly, it sounds like you haven't even tried this pull request? I'd like to stay on topic here since it's confusing to talk about other stuff unrelated to my changes. I initially thought you were saying my pull introduced new issues. If you're just asking general questions about prompt caching behavior then creating a discussion in the Q&A section is probably most appropriate: https://github.com/ggerganov/llama.cpp/discussions/categories/q-a You could possibly create an issue instead but Q&A discussion makes the most sense to me. If you want to @ me when you do that, I can probably answer some of your questions. |
@DannyDaemonic Thanks for taking the time to test and catching that problem! Since
The thing is, it actually saves and loads the RNG state there. So apparently someone though it was important to preserve that state. I'm reluctant to change that behavior in something that's mainly supposed to be a bug fix. I've made some further changes. Can you please test again? Hopefully I actually got it right this time - so much for a simple fix! I don't love the approach of having to save the seed parameter before it gets overwritten but I'm not sure there's really a better way without making significant changes. This is how the RNG/session loading interaction is supposed to work with the latest changes:
|
No, I did not. I don't see any point to "fix things" in seed with |
Okay, well, hopefully you can appreciate how it would be confusing to start talking about different issues in a pull request aiming to fix some other problem. Also, there's obviously a benefit to fixing existing problems even if it result in something that works perfectly for every use case. At the least, you're closer to a state where it will do what you want. Fixes and improvements are an incremental process, it's not all or nothing.
Maybe it's broken for your specific use case, but that doesn't mean it's broken/useless altogether. From your description, it seems like you're doing something unusual and pretty specific. Anyway, my advise is to create a discussion. Like I said, if you @ me I will try to help you out by answering some of your questions there. |
I just created a separate issue: #1585 BTW, your PR is titled |
This is using a feature in the llama.cpp API that stores and loads the entire state. This is so you can do things such as save states for a web server where you're having conversations with many people at once. Everything is ready to go from the stored state so you can just swap in whatever state you need and evaluate the next response. In this case however, the state is only being used as a --prompt-cache. Restoring things blindly is also causing issue #1585. It's probably also causing other settings besides |
I'm not fully convinced that's in issue. At least, that person hasn't show an actual problem there yet. But one of their problems seemed to be that you don't get the same sequence of tokens when restoring from the cache with a prompt that's a prefix of the saved one compared to when the prompt cache was initially saved. Not restoring the RNG state would still cause that "problem". I think the only way around that would be to store the RNG state after every token had been generated. Even then, it probably wouldn't be reliable because stuff like different sampling settings could result in a different number of random numbers generated per token (maybe?). I'm also a let less sure about making changes to the llama.cpp library itself, because that's going to affect all API users, all the other examples, etc. I don't really have the familiarity with this project to predict all the effects, I wouldn't really want to mess with it there. That couldn't fix the problem by itself anyway, since the |
CharacterAI chat: https://github.com/Cohee1207/SillyTavern/blob/999a94718df39227fc5ea2bc80006351f43c5a88/public/instruct/WizardLM.json#L3
Then it will print something like: And I need to craft this one:
This is not "adding to the end", it should be pasted after "User:" but before "### Response:", which means that I need to restore session from there somehow. For example, by cutting after last user input and fake-run just to save session to a separate file; then put Response-tail template and save another session derived from it; then regenerate at different seeds (if that would be possible of course!) or settings until I satisfied with the answer; then add it to the prompt and change the initial session the same way. That requires my wrapper to have FULL control over user prompts, and tracking which one related to each session copy. The best behavior would be if I won't even care about sessions, just give it a file and then work with the prompt ONLY, so the main.exe will decide: will it use the cache, update it, discard it – without any effects on the final result, just as it was generated from scratch. Currently, when given shorter prompt than in session – I get irrelevant replies, not just randomized! Roleplay just breaks if I change prompt arbitrary when using the same session file without copy-swapping it manually.
Aren't the "state" (context) is stored? Cache files are large, I presume they have all internal variable memory of the model. |
I think I know what he's talking about. I've seen similar things myself. Try this:
Then try this:
The joke will start with a
The main example doesn't call I do think setting it here in main is a good idea. |
I can confirm that. Wow, really weird. I don't think my changes make that part worse but it definitely seems like an actual bug that should be looked at. I don't really think it's related to saving the RNG state though. With my patch and overriding the RNG and reseeding it, I still see that behavior.
Are you saying you think the current approach I'm using is okay after all? What's your opinion of the current state of this pull request. Do you think anything needs to change before merging? |
Absolutely not. My point was just because something is restored by the state doesn't mean we need to use it for our prompt cache.
I'm still leaning towards always replacing the seed, no matter what. (Again, it's only a cache.) At the very least, we should honor |
That's reasonable. I don't have any strong opinion here, I'm just reluctant to make changes to the behavior because I'm not familiar with the decisions that lead to stuff being there or not. If anyone with authority (which might be you?) tells me it should be a certain way then I'm happy to accept it and do it that way.
Probably the only benefit of the current behavior is that if you restart from the cache with the whole prompt, not a prefix then you actually will get the same sequence (assuming the same sampler settings and no other code changes that would affect generation).
That makes sense. The only thing I'd change is using something a user would be less likely to select than So the change would involve making the default value for seed in the params structure be Sound okay? |
I wrote the session/prompt cache feature, building on the state saving APIs as mentioned. Overall, I went for the simplest approach (both in implementation and CLI usage) to speed up prompt evaluation. I absolutely agree it needs to be iterated on, and my impression is that nothing about prompt (or event state restoration) is particularly sacred at this point. Here's my take on some of the points being discussed:
|
I don't know if there's really any kind of hierarchy around here, other than ggerganov has the ultimate say since he's the mastermind behind it all. His philosophy seems to be: have fun, don't be afraid of breaking changes, and iterate. I think he generally relies on everyone to discuss changes and come up with reasonable solutions (while avoiding drama). I wouldn't be too worried about changing the behavior yet. All of this is very new, and as ejones said, "nothing about prompt (or event state restoration) is particularly sacred at this point." I think it's also the simpler solution; just overwrite it every time and to treat the That said, if you are depending on this feature as is, then let's go with option B and leave the seed as unchanged by default.
At the start of this I almost suggested using |
Yes, that's well said I am paying less attention to the changes in the examples and trust that you will figure out what's right. |
waiting for this |
Thanks everyone that replied, appreciate the information!
I'm fully convinced! I made that change and also added a little note to the README mentioning that restoring a cached prompt doesn't mean you get the same sequence of tokens as the original generation, even when specifying the same seed.
Nope, I only cared about being able to try multiple seeds from a cached prompt without having to repeatedly specify it.
As a pessimistic, risk-averse type of person I fear that would require brain surgery... |
1. Currently the --seed parameter is ignored when loading the prompt. However, a very common use case would be to save a prompt and then try several attempts at generation with different seeds. 2. When loading a cached prompt from a session, you have to specify the prompt again. Even worse, if you forget to enter a prompt you'll get your cached prompt overwritten by the blank one.
Display some helpful information to the user when loading a session to make it clear when the seed applies or not.
Add a note in the main example README about how restoring a prompt doesn't imply restoring the exact session state.
3e32104
to
156d70b
Compare
--seed
is ignored when loading sessionCurrently the
--seed
parameter is ignored when loading the prompt. However, a very common use case would be to save a prompt and then try several attempts at generation with different seeds.The pull includes a simple change that just sets the RNG if specified. Two small notes:
-seed
was actually actually specified as far as I know, only that it's not the default-1
value. So--seed -1
is the same as not including it: it won't override the cached seed.--seed 123
and then load it with-seed 123
the subsequent tokens will not match. I don't think there's an easy way around this. It's not 100% ideal but still a lot better than just completely ignoring the parameter with no warning.Blank prompt overwrites cached one
When loading a cached prompt from a session, you have to specify the prompt again. Even worse, if you forget to enter a prompt you'll get your cached prompt overwritten by the blank one because of course it has low similarity.
This pull changes that behavior to simply use the tokens from the saved session if
params.prompt
is empty (in other words not set via--prompt
or--file
).Closes #1439